Selecting and Generating Computational Meaning Representations for Short Texts
Language conveys meaning, so natural language processing (NLP) requires representations of meaning. This work addresses two broad questions: (1) What meaning representation should we use? and (2) How can we transform text to our chosen meaning representation? In the first part, we explore different meaning representations (MRs) of short texts, ranging from surface forms to deep-learning-based models. We show the advantages and disadvantages of a variety of MRs for summarization, paraphrase detection, and clustering. In the second part, we use SQL as a running example for an in-depth look at how we can parse text into our chosen MR. We examine the text-to-SQL problem from three perspectives, methodology, systems, and applications, and show how each contributes to a fuller understanding of the task.
Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/143967/1/cfdollak_1.pd
Sentence simplification, compression, and disaggregation for summarization of sophisticated documents
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/134176/1/asi23576.pd
GVdoc: Graph-based Visual Document Classification
The robustness of a model for real-world deployment is determined by how well it
performs on unseen data and distinguishes between in-domain and out-of-domain
samples. Visual document classifiers have shown impressive performance on
in-distribution test sets. However, they tend to have a hard time correctly
classifying and differentiating out-of-distribution examples. Image-based
classifiers lack the text component, whereas multi-modality transformer-based
models face the token serialization problem in visual documents due to their
diverse layouts. They also require a lot of computing power during inference,
making them impractical for many real-world applications. We propose GVdoc, a
graph-based document classification model that addresses both of these
challenges. Our approach generates a document graph based on its layout, and
then trains a graph neural network to learn node and graph embeddings. Through
experiments, we show that our model, even with fewer parameters, outperforms
state-of-the-art models on out-of-distribution data while retaining comparable
performance on the in-distribution test set.
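The core idea of the abstract above (build a graph from the document's layout, then let a graph neural network mix information between spatially related tokens) can be sketched in a few lines. This is a minimal illustrative sketch, not the GVdoc implementation: the node features, the distance-based edge rule, and the single round of mean-aggregation message passing are all simplifying assumptions.

```python
# Sketch of a layout graph: nodes are OCR tokens with box centers, edges
# connect tokens whose centers are spatially close, and one round of mean
# aggregation mixes each node's features with its neighbors'.
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    x: float  # bounding-box center, in pixels (assumed)
    y: float

def build_layout_graph(tokens, radius=50.0):
    """Connect tokens whose box centers lie within `radius` pixels."""
    edges = []
    for i, a in enumerate(tokens):
        for j, b in enumerate(tokens):
            if i < j and (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= radius ** 2:
                edges.append((i, j))
    return edges

def message_pass(features, edges):
    """One round of mean aggregation over neighbors (with self-loops)."""
    n, dim = len(features), len(features[0])
    neighbors = [[i] for i in range(n)]  # self-loop for every node
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    out = []
    for i in range(n):
        agg = [0.0] * dim
        for j in neighbors[i]:
            for d in range(dim):
                agg[d] += features[j][d]
        out.append([v / len(neighbors[i]) for v in agg])
    return out

tokens = [Token("Invoice", 10, 10), Token("No.", 40, 12), Token("Total", 10, 400)]
edges = build_layout_graph(tokens)   # only the two nearby tokens are linked
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.5]]
mixed = message_pass(feats, edges)
print(edges)     # [(0, 1)]
print(mixed[2])  # isolated node keeps its own features: [0.0, 0.5]
```

In a real model this aggregation would be a learned GNN layer repeated several times, followed by a graph-level pooling step to produce the document embedding used for classification; the point here is only that the graph follows layout rather than a serialized token order.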